Second order optimality in Markov decision chains
Similar resources
Second Order Optimality in Transient and Discounted Markov Decision Chains
Abstract. The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that in the class of policies maximizing (or minimiz...
On the spectral analysis of second-order Markov chains
Second order Markov chains which are trajectorially reversible are considered. Contrary to the reversibility notion for usual Markov chains, no symmetry property can be deduced for the corresponding transition operators. Nevertheless and even if they are not diagonalizable in general, we study some features of their spectral decompositions and in particular the behavior of the spectral gap unde...
Empirical Bayes Estimation in Nonstationary Markov chains
Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...
Bias Optimality for Multichain Markov Decision Processes
In recent research we find that the policy iteration algorithm for Markov decision processes (MDPs) is a natural consequence of the performance difference formula that compares the difference of the performance of two different policies. In this paper, we extend this idea to the bias-optimal policy of MDPs. We first derive a formula that compares the biases of any two policies which have the sa...
Markov Chains and Mixing Times, second edition
Unlike most books reviewed in the Intelligencer this is definitely a textbook. It assumes knowledge one might acquire in the first two years of an undergraduate mathematics program – basic mathematical probability, plus linear algebra, a little graph theory and the infamous concept of “mathematical maturity”. It has the theorem-proof style of pure mathematics, but with friendly explanations of ...
Journal
Journal title: Kybernetika
Year: 2018
ISSN: 0023-5954,1805-949X
DOI: 10.14736/kyb-2017-6-1086